TSM - Assignment.

Collect real and nominal yield data from the Federal Reserve for the period January 2005 to August 2022.

In [ ]:
import pandas as pd
from helpers import *
import plotly.graph_objects as go
import plotly.express as px
from statsmodels.tsa.stattools import adfuller
import chart_studio.plotly as py
import plotly.io as pio
import plotly.offline as pyo
import numpy as np

# render Plotly figures inline in the notebook
pio.renderers.default = "notebook"
# collect data from files
nominal_yields = pd.read_excel('nominal_yields.xlsx').dropna()
real_yields = pd.read_excel('real_yields.xlsx').dropna()
In [ ]:
# show the first 5 rows of nominal yields
nominal_yields.head()
Out[ ]:
Date BETA0 BETA1 BETA2 BETA3 SVEN1F01 SVEN1F04 SVEN1F09 SVENF01 SVENF02 ... SVENY23 SVENY24 SVENY25 SVENY26 SVENY27 SVENY28 SVENY29 SVENY30 TAU1 TAU2
0 2005-01-03 6.189826e-06 2.277828 2.659254 16.057365 3.4022 4.3567 5.7028 3.1680 3.5462 ... 5.0962 5.0874 5.0729 5.0535 5.0295 5.0014 4.9697 4.9348 1.436363 13.431858
1 2005-01-04 2.549831e-09 2.259019 3.046883 16.223499 3.5231 4.4071 5.7558 3.2872 3.6552 ... 5.1584 5.1495 5.1350 5.1155 5.0913 5.0631 5.0312 4.9960 1.393715 13.471621
2 2005-01-05 4.330379e-05 2.259806 3.089485 16.079062 3.5391 4.3983 5.7099 3.3009 3.6710 ... 5.1227 5.1131 5.0981 5.0780 5.0535 5.0249 4.9928 4.9574 1.412061 13.445450
3 2005-01-06 1.834049e-12 2.257302 2.977634 16.110743 3.5090 4.4031 5.7234 3.2678 3.6475 ... 5.1278 5.1183 5.1032 5.0831 5.0586 5.0299 4.9977 4.9623 1.423321 13.430211
4 2005-01-07 1.780984e-12 2.304075 2.956248 16.007722 3.5372 4.4163 5.6898 3.2920 3.6785 ... 5.1104 5.1006 5.0854 5.0651 5.0405 5.0118 4.9796 4.9442 1.476347 13.455895

5 rows × 100 columns

In [ ]:
# show the first 5 rows of real yields
real_yields.head()
Out[ ]:
Date BETA0 BETA1 BETA2 BETA3 BKEVEN02 BKEVEN03 BKEVEN04 BKEVEN05 BKEVEN06 ... TIPSY11 TIPSY12 TIPSY13 TIPSY14 TIPSY15 TIPSY16 TIPSY17 TIPSY18 TIPSY19 TIPSY20
0 2005-01-03 -1.504322 1.565163 -3167.402774 3176.711476 2.5935 2.5925 2.5779 2.5709 2.5767 ... 1.8002 1.8722 1.9313 1.9784 2.0144 2.0401 2.0565 2.0643 2.0644 2.0575
1 2005-01-04 -0.973675 1.184436 -3682.065233 3690.256864 2.5802 2.5909 2.5779 2.5701 2.5747 ... 1.8629 1.9333 1.9910 2.0369 2.0718 2.0967 2.1123 2.1197 2.1196 2.1127
2 2005-01-05 -0.692399 0.976122 -3186.398677 3193.908160 2.5502 2.5701 2.5611 2.5536 2.5569 ... 1.8764 1.9459 2.0028 2.0479 2.0821 2.1062 2.1213 2.1281 2.1276 2.1205
3 2005-01-06 -0.344569 0.609703 -1901.199337 1907.956756 2.5499 2.5726 2.5662 2.5605 2.5645 ... 1.8695 1.9399 1.9974 2.0428 2.0772 2.1015 2.1167 2.1237 2.1234 2.1168
4 2005-01-07 -0.535426 0.854999 -2554.392640 2561.606742 2.5209 2.5466 2.5423 2.5372 2.5405 ... 1.9012 1.9703 2.0267 2.0715 2.1053 2.1292 2.1441 2.1509 2.1504 2.1435

5 rows × 127 columns

In [ ]:
# select specific maturities
nominal_yields_2_10y_eom = selectYieldsMaturities(nominal_yields,type='nominal',FD=False)
real_yields_2_10y_eom = selectYieldsMaturities(real_yields,type='real',FD=False)
In [ ]:
# plot nominal data
plotYield(nominal_yields_2_10y_eom,columns=['SVENY02','SVENY03','SVENY05','SVENY07','SVENY10'],type='Nominal',FD=False)
In [ ]:
# plot real yield data (TIPS)
plotYield(real_yields_2_10y_eom,columns=['TIPSY02','TIPSY03','TIPSY05','TIPSY07','TIPSY10'],type='Real',FD=False)

We have extracted and plotted the data. Before proceeding with PCA, it is good practice to check that the data under consideration are stationary. The plots of nominal and real yields both show clear trends, which should be removed (or limited as much as possible) so that the statistical inference, and hence the interpretation of the results, is valid.

In [ ]:
# check for stationarity in nominal yields
ADFtest(nominal_yields_2_10y_eom,type='nominal',maturity='2y')
ADF Statistic: -1.784198
p-value: 0.388335
Critical Values:
	1%: -3.476
	5%: -2.881
	10%: -2.577
Non-Stationary timeserie
Out[ ]:
(-1.7841977737615378, 0.3883348001847712)
In [ ]:
# let's try for 5y yields
ADFtest(nominal_yields_2_10y_eom,type='nominal',maturity='5y')
ADF Statistic: -1.843918
p-value: 0.358886
Critical Values:
	1%: -3.475
	5%: -2.881
	10%: -2.577
Non-Stationary timeserie
Out[ ]:
(-1.8439183137699224, 0.3588855951201053)
In [ ]:
# let's try for 10y yields
ADFtest(nominal_yields_2_10y_eom,type='nominal',maturity='10y')
ADF Statistic: -1.902721
p-value: 0.330782
Critical Values:
	1%: -3.475
	5%: -2.881
	10%: -2.577
Non-Stationary timeserie
Out[ ]:
(-1.9027209249148123, 0.3307821864201821)
In [ ]:
# nominal yields are non-stationary, so we go back to the original data, take first differences, and then keep only the month-end observations
nominal_yields_2_10y_eom_shifted = selectYieldsMaturities(nominal_yields,type='nominal',FD=True)
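As a rough sketch of the transformation described in the comment above (first difference, then month end), the steps could look like this in pandas. The data are synthetic and the variable names are illustrative; this is not the implementation of `selectYieldsMaturities`, whose code is not shown here.

```python
import numpy as np
import pandas as pd

# Synthetic daily yields standing in for the real data (assumption).
dates = pd.bdate_range("2005-01-03", "2005-06-30")
rng = np.random.default_rng(0)
daily = pd.DataFrame(
    {"SVENY02": 3.0 + rng.normal(0, 0.05, len(dates)).cumsum()},
    index=pd.Index(dates, name="Date"),
)

# Step 1: first-difference the daily series and drop the leading NaN row.
daily_changes = daily.diff().dropna()

# Step 2: keep the last available observation of each calendar month.
month_end = daily_changes.groupby(daily_changes.index.to_period("M")).tail(1)

# Alternative order, for comparison: month-end levels first, then differences.
monthly_changes = daily.groupby(daily.index.to_period("M")).tail(1).diff().dropna()
```

The order matters: differencing first and sampling afterwards yields one daily change per month, whereas sampling month-end levels first and then differencing yields month-over-month changes.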
In [ ]:
# check stationarity after first difference (only for the 2y)
ADFtest(nominal_yields_2_10y_eom_shifted,type='nominal',maturity='2y')
ADF Statistic: -7.123238
p-value: 0.000000
Critical Values:
	1%: -3.477
	5%: -2.882
	10%: -2.578
Stationary timeserie
Out[ ]:
(-7.123237767881004, 3.6760004763770164e-10)
In [ ]:
# let's plot the nominal yields stationary data
plotYield(nominal_yields_2_10y_eom_shifted,columns=['SVENY02','SVENY03','SVENY05','SVENY07','SVENY10'],type='Nominal',FD=True)
In [ ]:
# check for stationarity in real yields
ADFtest(real_yields_2_10y_eom,type='real',maturity='2y')
ADF Statistic: -2.347080
p-value: 0.157233
Critical Values:
	1%: -3.476
	5%: -2.881
	10%: -2.577
Non-Stationary timeserie
Out[ ]:
(-2.347080382879149, 0.15723267907692623)
In [ ]:
# let's try with 5y
ADFtest(real_yields_2_10y_eom,type='real',maturity='5y')
ADF Statistic: -2.058364
p-value: 0.261583
Critical Values:
	1%: -3.475
	5%: -2.881
	10%: -2.577
Non-Stationary timeserie
Out[ ]:
(-2.05836398030184, 0.2615826889940359)
In [ ]:
# let's try with 10y
ADFtest(real_yields_2_10y_eom,type='real',maturity='10y')
ADF Statistic: -1.859705
p-value: 0.351248
Critical Values:
	1%: -3.476
	5%: -2.882
	10%: -2.577
Non-Stationary timeserie
Out[ ]:
(-1.8597053177414344, 0.351247809531534)
In [ ]:
# real yields are non-stationary, so we go back to the original data, take first differences, and then keep only the month-end observations
real_yields_2_10y_eom_shifted = selectYieldsMaturities(real_yields,type='real',FD=True)
In [ ]:
# check stationarity after first difference (only for the 2y)
ADFtest(real_yields_2_10y_eom_shifted,type='real',maturity='2y')
ADF Statistic: -13.979960
p-value: 0.000000
Critical Values:
	1%: -3.475
	5%: -2.881
	10%: -2.577
Stationary timeserie
Out[ ]:
(-13.97996046052464, 4.194864820791476e-26)
In [ ]:
# let's plot the real yields stationary data
plotYield(real_yields_2_10y_eom_shifted,columns=['TIPSY02','TIPSY03','TIPSY05','TIPSY07','TIPSY10'],type='Real',FD=True)

The preliminary analysis is now complete. The yield levels are non-stationary, so we took first differences, and the ADF test on the differenced 2y series rejects a unit root for both nominal and real yields. At this point, we can start with the principal component analysis (PCA) using the stationary data.
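One possible way to carry out the PCA step is an eigendecomposition of the covariance matrix of the differenced yields. The sketch below uses NumPy on synthetic data with a single common factor; the array `changes` is a hypothetical stand-in for the differenced yield columns, and the choice to center without standardizing is an assumption.

```python
import numpy as np

# Synthetic "yield changes": one common factor plus small idiosyncratic noise.
rng = np.random.default_rng(0)
n_obs, n_maturities = 200, 5
common = rng.normal(size=(n_obs, 1))
changes = common @ np.ones((1, n_maturities)) + 0.1 * rng.normal(size=(n_obs, n_maturities))

# Center each maturity, then decompose the sample covariance matrix.
centered = changes - changes.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order

# Sort components by descending explained variance.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()  # share of variance per component
scores = centered @ eigvecs          # time series of the principal components
print(np.round(explained, 4))
```

The columns of `eigvecs` are the loadings of each component across maturities, and `explained` gives the fraction of total variance each component captures.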

In [ ]:
# PCA